Constituency Parser for Hindi Noun Sequences and Role of Bracketing in Translation of English Compound Nouns into Hindi

Authors

  • Dipti Misra Sharma
  • Lakshmi Bai
  • Radhika Mamidi
  • Prashanth Mannem
  • Ankush Soni
  • Bhasha Agrawal
  • Himanshu Sharma
  • Karan Singla
  • Kshitij Mishra
  • Kunal Sachdeva
  • Naman Jain
  • Praveen Dakwale
  • Rahul Sharma
  • Rishabh Srivastava
  • Sambhav Jain
  • Sruti Rallapalli
Abstract

Complex noun sequences in Hindi can be formed by sequences of nouns and genitives. In Hindi, the genitive marker is “kā”, with the allomorphic variants “ke” and “kī”. When two or more nouns occur without any intervening post-position, the sequence is known as a compound noun. Some examples of complex noun sequences are: (1) “jilā cunāva adhikārī” (district election officer), (2) “tila kī miṭhāī kī dukāna” (shop of sweets made with sesame) and (3) “upabhoktā adālata ke vakīla” (consumer court’s lawyer). The rightmost noun is the head of the whole construction. The inner structure of the sequence can be quite complex: (a) nouns within the sequence can modify the rightmost head, or (b) a local head can modify another local head or the head of the complex noun sequence. For example, in (1), “adhikārī” is the head and both “jilā” and “cunāva” modify “adhikārī”, giving the structure (jilā (cunāva adhikārī)). In (2), by contrast, “tila” modifies “miṭhāī”, and “miṭhāī” in turn modifies “dukāna”, so the structure is ((tila kī miṭhāī) kī dukāna).

The more nouns a sequence contains, the more complex its structure. In the Hindi Treebank data, 85.37%, 12.54% and 1.80% of the sequences contain three, four and five nouns respectively. In this thesis, we attempt to bracket the local sub-structures of a complex noun sequence, a task we term constituency parsing. Constituency parsing recursively builds the inner structure of the complex noun sequence. It is a significant NLP task because the interpretation of a sequence depends on correctly identifying its inner structure. We explore both syntactic and statistical methods for predicting the bracketing of complex noun sequences. In Hindi, the genitive marker agrees with the head of the sub-sequence that it modifies; our syntactic approach uses this clue.
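The agreement clue can be illustrated with a small sketch. This is not the thesis's implementation: it assumes a hypothetical toy gender lexicon (`GENDER`), treats “kī” as marking a feminine head and “kā”/“ke” as marking a masculine head (ignoring the number and case distinctions between “kā” and “ke”), and brackets bare compounds right-branching by default, a decision the agreement clue alone cannot make. Each genitive phrase attaches to the nearest agreeing head to its right, falling back to the rightmost head when nothing agrees. The names `split_chunks`, `compound_str` and `bracket` are hypothetical.

```python
# Hypothetical toy lexicon: grammatical gender of each noun (surface form).
GENDER = {
    "jilā": "m", "cunāva": "m", "adhikārī": "m",
    "tila": "m", "miṭhāī": "f", "dukāna": "f",
    "upabhoktā": "m", "adālata": "f", "vakīla": "m",
    "rājā": "m", "sone": "m", "aṅgūṭhī": "f",
}

# Simplification: kī marks a feminine head; kā and ke mark a masculine head.
MARKER_GENDER = {"kā": "m", "ke": "m", "kī": "f"}


def split_chunks(tokens):
    """Split a noun sequence into bare-compound chunks and the genitive markers between them."""
    chunks, markers, current = [], [], []
    for tok in tokens:
        if tok in MARKER_GENDER:
            chunks.append(current)
            markers.append(tok)
            current = []
        else:
            current.append(tok)
    chunks.append(current)
    return chunks, markers


def compound_str(nouns):
    """Right-branching default for a bare compound, e.g. (jilā (cunāva adhikārī))."""
    if len(nouns) == 1:
        return nouns[0]
    return f"({nouns[0]} {compound_str(nouns[1:])})"


def bracket(tokens):
    chunks, markers = split_chunks(tokens)
    n = len(chunks)
    head = [None] * n  # head[i] = index of the chunk that chunk i modifies
    # Work right to left so every later chunk already has its head.
    for i in range(n - 2, -1, -1):
        wanted = MARKER_GENDER[markers[i]]
        j = i + 1
        # Only the chain i+1, head[i+1], ... keeps the brackets non-crossing.
        while j is not None and GENDER[chunks[j][-1]] != wanted:
            j = head[j]
        head[i] = n - 1 if j is None else j  # no agreement: attach to the overall head

    def build(j):
        text = compound_str(chunks[j])
        # Closest modifier first, so earlier modifiers take wider scope.
        for d in sorted((d for d in range(j) if head[d] == j), reverse=True):
            text = f"({build(d)} {markers[d]} {text})"
        return text

    return build(n - 1)


print(bracket(["tila", "kī", "miṭhāī", "kī", "dukāna"]))  # ((tila kī miṭhāī) kī dukāna)
print(bracket(["rājā", "kī", "sone", "kī", "aṅgūṭhī"]))   # (rājā kī (sone kī aṅgūṭhī))
```

In the second call, an illustrative phrase not taken from the thesis (“the king’s gold ring”), the masculine “sone” does not agree with the first “kī”, so “rājā” skips it and attaches to the feminine “aṅgūṭhī”. Restricting candidates to the chain of already-attached heads is what keeps the resulting brackets well nested.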
In the statistical approach, we mainly exploit the affinity between a head and its modifier, measured by how frequently they occur together in the corpus. The method is augmented with semantic class information for the head and modifier nouns, taken from Hindi WordNet. Finally, we combine the two methods into a hybrid approach for bracketing complex noun sequences, which achieves 85.85% accuracy. In this thesis, we also show that identifying the inner structure of a complex noun sequence helps in determining its translation. For this experiment, we take three-word English noun compounds and translate them into Hindi. The translation strategy is based on our observation of English-Hindi parallel corpora, where we observe (as others have also reported) that English licenses multiword noun compounds more frequently than Hindi does. Hindi prefers syntactic phrases in which a genitive post-position is inserted between the head and the modifier. In the case of compounds with three...
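The statistical side can be sketched for the simplest case, a bare three-noun compound. The abstract does not specify the affinity measure, so this sketch swaps in the dependency-model comparison associated with Lauer (1995): co-occurrence counts decide whether the first noun modifies the second, giving ((n1 n2) n3), or the third, giving (n1 (n2 n3)). When the lexical counts give no preference, it backs off to counts over semantic classes, standing in for Hindi WordNet classes. The function `bracket_three`, its parameters, and the toy counts are all hypothetical, and the hybrid combination is not shown.

```python
from collections import Counter


def bracket_three(n1, n2, n3, pair_counts, noun_class, class_pair_counts):
    """Bracket a bare three-noun compound by asking whether n1 modifies n2 or n3.

    pair_counts:       Counter of (modifier, head) noun co-occurrence counts.
    noun_class:        dict noun -> semantic class (a stand-in for Hindi WordNet).
    class_pair_counts: Counter of (modifier class, head class) counts.
    """
    left = pair_counts[(n1, n2)]   # evidence that n1 modifies n2
    right = pair_counts[(n1, n3)]  # evidence that n1 modifies n3
    if left == right:
        # No lexical preference (typically both pairs unseen): back off to
        # semantic-class counts.
        c1, c2, c3 = (noun_class.get(n) for n in (n1, n2, n3))
        left = class_pair_counts[(c1, c2)]
        right = class_pair_counts[(c1, c3)]
    if left > right:
        return f"(({n1} {n2}) {n3})"
    return f"({n1} ({n2} {n3}))"  # ties default to right-branching


# Toy counts invented for illustration only; they are not corpus statistics.
pairs = Counter({("jilā", "cunāva"): 3, ("jilā", "adhikārī"): 25})
print(bracket_three("jilā", "cunāva", "adhikārī", pairs, {}, Counter()))
# prints (jilā (cunāva adhikārī))
```

Raw counts are used only to keep the sketch short; any association measure could replace them in the two comparisons.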


Similar Resources

A Neural Network based Approach for English to Hindi Machine Translation

In this paper, we discuss the working of our English to Hindi Machine Translation system. Our system can translate simple English sentences into Hindi. The system has been implemented using a feed-forward backpropagation artificial neural network. The ANN model selects translation rules for grammar structure and Hindi words/tokens (such as verb, noun/pronoun etc...

Hybrid Approach for Hindi to English Transliteration System for Proper Nouns

In this paper, a hybrid approach is presented to transliterate proper nouns written in Hindi into their English equivalents. The hybrid approach combines direct mapping, a rule-based approach and a statistical machine translation approach. Transliteration is the process of generating words from the source language in the target language. The reverse process is known ...

A Hybrid Approach for Bracketing Noun Sequence

For a resource-poor language like Hindi, it is very difficult to bracket a noun sequence using approaches based only on a corpus or a lexical database. For semantic knowledge, the power of both types of resources needs to be combined. Therefore, the affinity between two nouns is measured using backoff association, which is the combination of lexical and conceptual associ...

A System for Compound Noun Multiword Expression Extraction for Hindi

Compound noun multiword expressions are important for many NLP applications like machine translation and information retrieval. This paper describes a system for Hindi compound noun multiword expressions (MWE) extraction from a given corpus. We identify major categories of compound noun MWEs, based on linguistic and psycholinguistic principles. Our extraction methods use various statistical co-...

Syntactic Construct : An Aid for translating English Nominal Compound into Hindi

This paper illustrates a way of using the paraphrasal interpretation of English nominal compounds to translate them into Hindi. The input nominal compound is first paraphrased automatically with the 8 prepositions proposed by Lauer (1995) for the task. English prepositions have a one-to-one mapping to post-positions in Hindi. The English paraphrases are then translated into Hindi using the mapping sc...

Publication date: 2017